Adjusting for a confounding variable when comparing means
Sometimes you know that the outcome you are comparing, such as reduction in blood pressure, is
influenced not only by a treatment approach (such as drug A compared to drug B), but also by other
confounding variables (such as age, whether the patient has diabetes, whether the patient smokes
tobacco, and so on). These confounders are considered nuisance variables because they have a
known impact on the outcome and may be more prevalent in some groups than in others. If a large
proportion of the group on drug A were over age 65, and only a small proportion of those on drug B
were over age 65, older age would have an influence on the outcome that would not be attributable to
the drug. Such a comparison would be confounded by age. (See Chapter 20 for a comprehensive review
of confounding.)
When you are comparing means between groups, you are doing a bivariate comparison, meaning it
involves only two variables: the group variable and the outcome. Because a bivariate comparison has
no place for a third variable, adjusting for confounding must be done through a multivariate
analysis using regression.
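To make the idea of adjustment concrete, here is a minimal sketch in Python that uses only the standard library. The data are made up for illustration and contain no random noise: each blood pressure reduction is generated as 20 + 4 × (on drug A) − 0.2 × age, and the drug A group is deliberately older. The helper names (solve_linear_system, ols) are hypothetical, and in practice you would rely on statistical software rather than hand-rolled least squares.

```python
from statistics import mean

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Made-up illustration data: 1 = drug A, 0 = drug B. Drug A patients are older.
drug_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
age = [70, 72, 68, 66, 50, 40, 45, 50, 55, 70]
# Assumed data-generating rule (no noise): drug A adds 4, each year of age removes 0.2.
reduction = [20 + 4 * d - 0.2 * a for d, a in zip(drug_a, age)]

# Bivariate comparison: difference in group means, ignoring age.
mean_a = mean(y for y, d in zip(reduction, drug_a) if d == 1)
mean_b = mean(y for y, d in zip(reduction, drug_a) if d == 0)
unadjusted = mean_a - mean_b

# Regression with an intercept, the group variable, and the confounder.
rows = [[1.0, float(d), float(a)] for d, a in zip(drug_a, age)]
intercept, drug_effect, age_effect = ols(rows, reduction)

print(f"Unadjusted difference in means: {unadjusted:.2f}")
print(f"Age-adjusted drug effect:       {drug_effect:.2f}")
print(f"Effect of one year of age:      {age_effect:.2f}")
```

In this toy data set, the unadjusted difference in means (about 1.36) understates the drug effect built into the data (4) because the older drug A group starts at a disadvantage. Including age in the regression recovers the built-in effect.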
Comparing means from sets of matched numbers
Often when biostatisticians consider comparing means between two or more groups, they are thinking
of independent samples of data. When dealing with study participants, independent samples means that
the data you are comparing come from different groups of participants who are not connected to each
other statistically or literally. But in some scenarios, your intention is to compare means from matched
data, meaning some sort of pairing exists in the data. Here are some common examples of matched
data:
The values come from the same participants, but at two or more different times, such as before and
after some kind of treatment, intervention, or event.
The values come from a crossover clinical trial, in which the same participant receives two or
more treatments at two or more consecutive phases of the trial.
The values come from two or more different participants who have been paired, or matched, in
some way as part of the study design. For example, in a study of participants who have
Alzheimer’s disease compared to healthy participants, investigators may choose to age-match each
Alzheimer’s patient to a healthy control when they recruit so both groups have the same age
distribution.
Comparing means of matched pairs
If you have paired data, you must use a paired comparison. Paired comparisons are usually handled by
the paired Student t test that we describe later in this chapter under “Surveying Student t tests.” If your
data aren’t normally distributed, you can use the nonparametric Wilcoxon Signed-Ranks test instead.
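As a rough illustration of what a paired comparison works with, the following Python sketch computes the paired t statistic by hand from within-pair differences. The blood pressure values are invented for the example, and the variable names are hypothetical. The sketch stops at the t statistic and its degrees of freedom; obtaining a p value requires the Student t distribution, which statistical software provides.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up systolic blood pressure readings for five participants,
# listed in the same participant order in both lists.
before = [150, 142, 160, 155, 148]
after = [144, 140, 151, 150, 146]

# Step 1: reduce each pair to a single number, the within-pair difference.
differences = [post - pre for pre, post in zip(before, after)]

# Step 2: treat the differences as one sample and test their mean
# against a hypothesized value (0 means "no change on average").
hypothesized = 0
n = len(differences)
mean_diff = mean(differences)
standard_error = stdev(differences) / sqrt(n)  # stdev uses n - 1
t_statistic = (mean_diff - hypothesized) / standard_error
degrees_of_freedom = n - 1

print(f"Differences: {differences}")
print(f"Mean difference: {mean_diff:.2f}")
print(f"t = {t_statistic:.3f} on {degrees_of_freedom} degrees of freedom")
```

Note that the order of the subtraction (after minus before here) only flips the sign of the t statistic; it does not change its magnitude.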
The paired Student t test and the one-group Student t test are actually the same test. When you
run a paired t test, the statistical software first calculates the difference between each pair of
numbers. If you are comparing a post-treatment value to a pretreatment value, the software starts by
subtracting one value from the other for each participant. The software then runs a test to
see whether the mean of those differences is statistically significantly different from the hypothesized value